Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

serialized parsers #1815

Merged
merged 16 commits into from
Jun 22, 2023
Merged

serialized parsers #1815

merged 16 commits into from
Jun 22, 2023

Conversation

jurgenvinju
Copy link
Member

  • removed public from all functions in ParseTree
  • added interface for saving and loading parsers from disk without exposing the file internals (could be class files or grammar values, etc.)

@codecov
Copy link

codecov bot commented Jun 20, 2023

Codecov Report

Merging #1815 (01cbace) into main (f2e37a0) will increase coverage by 0%.
The diff coverage is 72%.

@@           Coverage Diff            @@
##              main   #1815    +/-   ##
========================================
  Coverage       48%     48%            
- Complexity    6031    6056    +25     
========================================
  Files          670     670            
  Lines        58541   58660   +119     
  Branches      8536    8546    +10     
========================================
+ Hits         28570   28673   +103     
- Misses       27808   27815     +7     
- Partials      2163    2172     +9     
Impacted Files Coverage Δ
.../org/rascalmpl/test/infrastructure/QuickCheck.java 31% <ø> (ø)
src/org/rascalmpl/values/IRascalValueFactory.java 16% <0%> (-9%) ⬇️
...rc/org/rascalmpl/interpreter/utils/JavaBridge.java 57% <58%> (+<1%) ⬆️
...g/rascalmpl/values/RascalFunctionValueFactory.java 56% <67%> (+1%) ⬆️
src/org/rascalmpl/library/Prelude.java 44% <69%> (+<1%) ⬆️
src/org/rascalmpl/parser/ParserGenerator.java 59% <77%> (+1%) ⬆️
.../org/rascalmpl/interpreter/utils/JavaCompiler.java 64% <84%> (+5%) ⬆️

... and 13 files with indirect coverage changes

@jurgenvinju jurgenvinju marked this pull request as ready for review June 21, 2023 17:25
@jurgenvinju jurgenvinju self-assigned this Jun 22, 2023
@@ -125,7 +125,7 @@
<artifactId>rascal-maven-plugin</artifactId>
<version>${rascal-maven.version}</version>
<configuration>
<errorsAsWarnings>false</errorsAsWarnings>
<errorsAsWarnings>true</errorsAsWarnings>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is necessary due to double presence of rascal.jar during the tutor run. For some reason the old builtins are linked against the new code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have to fix this in the tutor maven-plugin somehow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and errors as warnings fixes this? interesting

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well yes, the tutor fails then on the ParseTree module, but the build isn't stopped by this because those errors are reported as warnings only. When I do a full bootstrap, the error goes away again, but I'd rather not have te tutor's own run-time interfere with the run-time of the examples it is running.. very difficult in the case of rascal.jar. I believe it will require JVM module layers to fix this.

@@ -310,7 +310,7 @@ data Symbol
data Symbol // <19>
= \conditional(Symbol symbol, set[Condition] conditions);

public bool subtype(Symbol::\sort(_), Symbol::\adt("Tree", _)) = true;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ParseTree is one of the most visible modules in the library; by using public everywhere it is priming our users to do the same, while we decided that public was the default. Over the course of the next months, I'll be removing those modifiers from the library modules.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah right, that's why this sneaked in ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, a guilty pleasure.

} catch (ClassCastException e) {
throw new ImplementationError("parser generator:" + e.getMessage(), e);
} catch (Throw e) {
throw new ImplementationError("parser generator: " + e.getMessage() + e.getTrace());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can only test this by having a serious bug in the parser generator. This should never happen.

rascalValues.storeParsers(start, saveLocation);
}
catch (IOException e) {
throw RuntimeExceptionFactory.io(e.getMessage());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be tested by writing to a non-existing folder or something. however we'd not be testing the actual code that we wrote here, just the URIResolverRegistry.

{
return R<0> + R<1>;
return R.from + R.to;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is an accident; I was testing the type-checker for another diagnose run.

Copy link
Member Author

@jurgenvinju jurgenvinju left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleaned up some unrelated changes. otherwise I tested it on several smaller and larger grammars. The speed-up of using a saved parser against loading the generator and generating one is around 1000 times.

Copy link
Member

@DavyLandman DavyLandman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me, nice first step! I think you want to build the concrete syntax on top of this in a new branch?

@@ -125,7 +125,7 @@
<artifactId>rascal-maven-plugin</artifactId>
<version>${rascal-maven.version}</version>
<configuration>
<errorsAsWarnings>false</errorsAsWarnings>
<errorsAsWarnings>true</errorsAsWarnings>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and errors as warnings fixes this? interesting

compile(fileMap, diagnostics);

// side-effect alert:
// now the local classloader contains the .class file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a new side-effect? or was this always a side-effect?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was always there, and necessary for the generated sub-classes bytecode to become available when instantiating the parser class object. However, normally we'd not ignore the return class and we would not be reading ourselves directly from the virtual file system later. Definitely not purely functional methods here.

manifest.getMainAttributes().put(Attributes.Name.MAIN_CLASS, qualifiedClassName);

try (JarOutputStream target = new JarOutputStream(output, manifest)) {
for (Entry<String, JavaFileObject> entry : classes.entrySet()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this saves all previously compiled classes to the file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but the compiler is organized to be instantiated per run. So you get all the class files generated from compiling one Java source file loaded into one virtual file system that is wrapped by one classloader.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of parser generation, this entails one main file and as many sub-classes as there are non-terminals (including lists etc.) in the grammar.

Copy link
Member Author

@jurgenvinju jurgenvinju Jun 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you need to compile several collaborating source files, you can use 1 instance and still save the class files into one comprehensive jar file that will work if you use the load facility here to get the files back.

Note that I have chosen to not use .class file extensions and not change . to / to create a JVM folder structure. The reason is that this encoding is to remain opaque to our users, and maximally fast to deserialize. If we switch parsing technology, we may not even store our parsers as jars or class files.

var file = new JavaFileObjectImpl(className, JavaFileObject.Kind.CLASS);

try (var fo = file.openOutputStream()) {
fo.write(Prelude.consumeInputStream(jarIn));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could use a regular copy loop, to avoid loading it all into memory first?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but so far this code has proven to be blazingly fast so I thought this is nice and short as it is. We average around 4 ms for storing or reading a medium sized grammar (so that includes reading and writing all byte arrays for the JVM bytecode of all sub-classes and the main parser class.

* after `storeParsers(g, file);`
* then `g = loadParsers(file);`
* and given `h = parsers(g);`
* then for all valid `nt`, `input` and `origin`: `g(nt, input, origin) == g(nt, input, origin)`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both side of the == is the exact same?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100% exact.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah. h

}
@pitfalls{
* reifiying types (use of `#`) will trigger the loading of a parser generator anyway. You have to use
this notation for types to avoid that: `type(\start(sort("MySort")), ())` to avoid the computation for `#start[A]`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good tip.

I guess any concrete syntax will still trigger it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any type will trigger it. That's because the translation from concrete syntax types to abstract type symbols is implemented only once in lang::rascal::grammar::definition::Symbols.

@@ -1149,6 +1149,16 @@ public static String consumeInputStream(Reader in) throws IOException {
}
return res.toString();
}

public static byte[] consumeInputStream(InputStream in) throws IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be some places in Prelude we can replace with this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did that before; those were all Readers I believe.

storeParsers(#start[A], |test-temp:///parsers.jar|);
p = loadParsers(|test-temp:///parsers.jar|);

return p(type(\start(sort("A")), ()), "a", |origin:///|) == parse(#start[A], "a", |origin:///|);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice test :) can we also save a file somewhere? and load it back in?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the question. The test saves the file in |test-temp:///parsers.jar| and then loads it back from that file, assigning the parser function into g. Then it tests if that recovered parser function behaves the same as a freshly generated parser function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but this code already has a #start[A] in there. So what if we load a grammar that isn't reified in the code?

Copy link
Member Author

@jurgenvinju jurgenvinju Jun 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent question. I've coded that scenario in the @examples of the documentation, because there I can freely start a fresh REPL that has no declaration memory. I can not do that here unfortunately. Nevertheless, yes that scenario works 100%. It's a bit weird, because you do get parse trees for types that have never been declared. But, it works.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason that it works is that the parsers function and the storeParsers/loadParsers function generate functions that take only the type as parameter and ignore the grammar part of the type values. The grammar has already been used by the store function, and the start type is only used to decide which part of the grammar to activate.

@jurgenvinju
Copy link
Member Author

It looks good to me, nice first step! I think you want to build the concrete syntax on top of this in a new branch?

Yes; but I'd like to see first if this does not solve your immediate problem already. I mean it could lead to immediate highlighting in the editor without that additional fix. Of course, it would be even faster if we also parse modules with concrete syntax like this, but perhaps this does not have to be done in such a rush. There are so many ways to do that; I'd like to think a bit about that before committing to a strategy that could lead to future problems.

fixed typo thanks to @DavyLandman
same typo in documentation clone.
@DavyLandman
Copy link
Member

Yes; but I'd like to see first if this does not solve your immediate problem already. I mean it could lead to immediate highlighting in the editor without that additional fix. Of course, it would be even faster if we also parse modules with concrete syntax like this, but perhaps this does not have to be done in such a rush. There are so many ways to do that; I'd like to think a bit about that before committing to a strategy that could lead to future problems.

Well, only if I would change the interface of Rascal-LSP to allow passing in these pre-loaded parsers. Otherwise I still have to wait for the imports to finish before we get the parser handed to rascal-lsp.

@jurgenvinju
Copy link
Member Author

Well, only if I would change the interface of Rascal-LSP to allow passing in these pre-loaded parsers. Otherwise I still have to wait for the imports to finish before we get the parser handed to rascal-lsp.

Right, so the main module that imports util::LanguageServer must also import all the other modules that may depend on concrete syntax and they would trigger loading of the parser generator, even before the language is fully registered. I didn't think of that.

Well. yep. Then this independent feature is now a plateau we can build upon. @PaulKlint It will also help in writing the compiler, because now the compiler can call parser generation at compile time, store the parsers in the target folder, and those parsers can be loaded at run-time again very quickly.

@DavyLandman
Copy link
Member

The original idea of that issue could still work for this? Register language gets an optional argument that you can pass in a serialized parser. Then in rascal-lsp we can preload the contribution with that, and then do the loading of the module in the background as we did before. Just a shorter time until the big wall of white/black text is gone.

@jurgenvinju jurgenvinju merged commit f4d99dc into main Jun 22, 2023
@jurgenvinju jurgenvinju deleted the serialized-parsers branch June 22, 2023 10:12
@jurgenvinju
Copy link
Member Author

jurgenvinju commented Jun 22, 2023

The original idea of that issue could still work for this? Register language gets an optional argument that you can pass in a serialized parser. Then in rascal-lsp we can preload the contribution with that, and then do the loading of the module in the background as we did before. Just a shorter time until the big wall of white/black text is gone.

I have to think better about this. As it stands that solution does not work. You need more than the serialized parser to produce a parse tree; first of all also the top-nonterminal. If we'd pass a function to resolve that issue we'd run into the previous loading issue, so that solution would require yet another field in Language.

Another hesitation is that we've abstracted parser as a function for good reasons. Sometimes parsing requires pre-processing and post-processing, and has side-effects on the IDE's other functions. The more we steer towards "static" parsers, the more penalty for languages that do not fit into that box.

For that reason I'd rather have the modules load fast, and we can keep the interface clean. So I'd like to think about that for a little longer.

@DavyLandman
Copy link
Member

Okay that makes sense, I would also appreciate an approach that doesn't require all kinds of bespoke solutions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants